Algorithmic Energy Saving for Parallel Cholesky, LU, and QR Factorizations

Authors

  • Li Tan
  • Zizhong Chen
Abstract

Slack is pervasive in runs of high performance applications, even with various performance-boosting solutions in place, and it offers ample opportunity to improve the energy efficiency of high performance computing. Setting communication slack aside, the classic approaches to saving energy during slack are race-to-halt and CP-aware slack reclamation, both of which rely on power scaling techniques to adjust processor power states judiciously during the slack. Existing work shows that CP-aware slack reclamation saves more energy than race-to-halt. In this paper, we formally model our observation that the gap in energy saving capability between the two approaches narrows significantly on today’s processors, because state-of-the-art CMOS technologies allow only insignificant variation of supply voltage as the operating frequency of a processor scales. We also validate the model experimentally on a large-scale power-aware cluster.
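The narrowing gap described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's formal model: it assumes only the textbook dynamic-power relation P = k·f·V², ignores static and leakage power, interpolates voltage linearly between the lowest and highest power states, and uses hypothetical frequency and voltage values. All function names are illustrative.

```python
def dynamic_power(freq, volt, k=1.0):
    """Dynamic CMOS power, P = k * f * V^2 (k lumps activity factor and capacitance)."""
    return k * freq * volt ** 2


def voltage_at(freq, f_lo, f_hi, v_lo, v_hi):
    """Assumed linear voltage/frequency relation between the two extreme states."""
    return v_lo + (v_hi - v_lo) * (freq - f_lo) / (f_hi - f_lo)


def race_to_halt_energy(work_time, slack, f_lo, f_hi, v_lo, v_hi):
    """Run at the highest state, then sit in the lowest state for the whole slack."""
    busy = dynamic_power(f_hi, v_hi) * work_time
    idle = dynamic_power(f_lo, v_lo) * slack
    return busy + idle


def slack_reclamation_energy(work_time, slack, f_lo, f_hi, v_lo, v_hi):
    """Lower the frequency so the task stretches into the slack."""
    # Frequency that fills work_time + slack exactly, clamped at the lowest state.
    freq = max(f_lo, f_hi * work_time / (work_time + slack))
    run_time = work_time * f_hi / freq
    volt = voltage_at(freq, f_lo, f_hi, v_lo, v_hi)
    busy = dynamic_power(freq, volt) * run_time
    # Any slack left over (only when clamped) is spent in the lowest state.
    leftover = max(0.0, work_time + slack - run_time)
    idle = dynamic_power(f_lo, v_lo) * leftover
    return busy + idle


def saving_gap(v_lo, v_hi, work_time=1.0, slack=0.5, f_lo=0.8, f_hi=2.4):
    """Fraction of race-to-halt energy that slack reclamation additionally saves."""
    halt = race_to_halt_energy(work_time, slack, f_lo, f_hi, v_lo, v_hi)
    reclaim = slack_reclamation_energy(work_time, slack, f_lo, f_hi, v_lo, v_hi)
    return (halt - reclaim) / halt


if __name__ == "__main__":
    # Hypothetical voltage ranges: a wide one and a narrow one.
    print("wide voltage range  :", round(saving_gap(v_lo=0.9, v_hi=1.5), 3))
    print("narrow voltage range:", round(saving_gap(v_lo=1.2, v_hi=1.3), 3))
```

Under these assumptions the advantage of slack reclamation comes mostly from the V² term, so it shrinks when the voltage range between power states is narrow, which is the qualitative effect the abstract attributes to recent CMOS technologies.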


Similar resources

LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs

We present performance results for dense linear algebra using the 8-series NVIDIA GPUs. Our matrix-matrix multiply routine (GEMM) runs 60% faster than the vendor implementation in CUBLAS 1.1 and approaches the peak of hardware capabilities. Our LU, QR and Cholesky factorizations achieve up to 80–90% of the peak GEMM rate. Our parallel LU running on two GPUs achieves up to ~300 Gflop/s. These re...


Fill-in reduction in sparse matrix factorizations using hypergraphs

We discuss partitioning methods using hypergraphs to produce fill-reducing orderings of sparse matrices for Cholesky, LU and QR factorizations. For the Cholesky factorization, we investigate a recent result on pattern-wise decomposition of sparse matrices, generalize the result, and develop algorithmic tools to obtain more effective ordering methods. The generalized results help us to develop f...


PoLAPACK: parallel factorization routines with algorithmic blocking

LU, QR, and Cholesky factorizations are the most widely used methods for solving dense linear systems of equations, and have been extensively studied and implemented on vector and parallel computers. Most of these factorization routines are implemented with block-partitioned algorithms in order to perform matrix-matrix operations, that is, to obtain the highest performance by maximizing reuse of...


Design and Performance Modeling of Parallel Block Matrix Factorizations for Distributed Memory Multicomputers

Efficient and scalable parallel block algorithms for the LU factorization with partial pivoting, the Cholesky, and QR factorizations in a distributed memory multicomputer environment are presented. The distributed system is viewed as a ring of processors and the algorithms correspond to shared memory algorithms parallelized on block level (explicit parallelism). Performance of the algorithms are ...


Parallel Block Matrix Factorizations for Distributed Memory Multicomputers

Efficient and scalable parallel block algorithms for the LU factorization with partial pivoting, the Cholesky, and QR factorizations in a distributed memory multicomputer environment are presented. The distributed system is viewed as a ring of processors and the algorithms correspond to shared memory algorithms parallelized on block level (explicit parallelism). Performance of the algorithms are...




Publication date: 2014